
Op4dTensorGeneric kernel upgrade #3458


Closed

Conversation

@novakovicdj (Contributor) commented Jan 3, 2025

This PR introduces a new, upgraded Op4dTensorGeneric kernel, as part of porting kernels from OCL to HIP.

Below is a performance comparison (speed-ups and performance drops) between the new Op4dTensorGeneric kernel and the other OpTensor kernels used for 4D tensors.

This PR is opened as a draft for now; if everyone is OK with this new Op4dTensorGeneric kernel, I will update the PR and replace the old kernel with the new one.

Test cases were generated and run from the tensor_4d_generic_ocl_hip.cpp file; the largest tensor is 128 MB.

New Op4dTensorGeneric vs. old OpTensorFwdBias (B = 1C11 case)

  • 47502 test runs, float data type
  • Average speed-up over the whole test set: x15.06

| Tensor size | Speed-up |
| --- | --- |
| size <= 32KB | 1.31 |
| 32KB < size <= 4MB | 8.5 |
| size > 4MB | 19.86 |

| Performance drop | % of test runs |
| --- | --- |
| more than 5% | 24.4 |
| more than 10% | 15.1 |
| more than 20% | 6.8 |

New Op4dTensorGeneric vs. old OpTensorLeadingOnes (B = N111, NC11, NCH1, 1111)

  • 190009 test runs, float data type
  • Average speed-up over the whole test set: x26.12

| Tensor size | Speed-up |
| --- | --- |
| size <= 32KB | 1.39 |
| 32KB < size <= 4MB | 12.69 |
| size > 4MB | 35.49 |

| Performance drop | % of test runs |
| --- | --- |
| more than 5% | 12.1 |
| more than 10% | 9.3 |
| more than 20% | 5.3 |

New Op4dTensorGeneric vs. old Op4dTensorLite (B = NCHW)

  • Tried on 2750 and 7280 test runs, float data type
  • Average speed-up over the whole test set is below 1 (~0.75)

New Op4dTensorGeneric vs. old Op4dTensorGeneric (B = all cases)

  • 760032 test runs, float data type
  • Average speed-up over the whole test set: x29.58

| Tensor size | Speed-up |
| --- | --- |
| size <= 32KB | 1.95 |
| 32KB < size <= 4MB | 15.94 |
| size > 4MB | 39.39 |

| Performance drop | % of test runs |
| --- | --- |
| more than 5% | 3.1 |
| more than 10% | 1.8 |
| more than 20% | 0.4 |

@novakovicdj (Contributor, Author) commented Jul 7, 2025

Re-tested the performance of these kernels with different measurements: instead of comparing times, I calculated useful operations per second (GFLOPS) and bytes transferred to/from memory per second (GB/s).

Tested with tests generated in the tensor_4d_generic_ocl_hip.cpp file, packed tensors only, sizes from 32 MB to 4 GB per tensor, on gfx1030 (Radeon RX 6800 XT); the comparison was performed on the applicable B tensor dimensions.

Comparison with old Op4dTensorGeneric

| | Old kernel | New kernel | Speed-up |
| --- | --- | --- | --- |
| GFLOPS | 3.187 | 193.437 | x60.7 |
| GB/s | 11.135 | 494.846 | x44.44 |

Comparison with Op4dTensorLite

| | Op4dTensorLite | New Op4dTensorGeneric | Speed-up |
| --- | --- | --- | --- |
| GFLOPS | 138.89 | 116.723 | x0.84 |
| GB/s | 104.65 | 368.346 | x3.52 |

Comparison with OpTensorFwdBias

| | OpTensorFwdBias | New Op4dTensorGeneric | Speed-up |
| --- | --- | --- | --- |
| GFLOPS | 60.1 | 204.468 | x3.4 |
| GB/s | 192.591 | 500.374 | x2.6 |

Comparison with OpTensorLeadingOnes

| | OpTensorLeadingOnes | New Op4dTensorGeneric | Speed-up |
| --- | --- | --- | --- |
| GFLOPS | 116.485 | 209.944 | x1.8 |
| GB/s | 382.389 | 498.089 | x1.3 |

@novakovicdj marked this pull request as ready for review July 8, 2025 06:48
@BradPepersAMD (Collaborator) commented

MIOpen is moving to the new monorepo setup and all older unmerged PRs are being closed. Please re-open this as part of the new repo if these changes are still needed.
